Exploring bigram character features for Arabic text clustering
Similar resources
Identification of Arabic word from bilingual text using character features
Identifying the language of a script is an important stage in handwriting recognition. Several works in this research area treat various languages, and most of the methods used are global or statistical. In this paper, we study the possibility of using the features of scripts to identify the language. The identification of the language of the s...
On Optical Character Recognition of Arabic Text
Although optical character recognition has made tremendous advances in the area of desktop publishing, a huge amount of work remains to be done. Unlike languages written in Roman-like scripts, various languages have a large number of fonts and/or complicated character shapes. Arabic is one of those languages, and it is somewhat complicated in its construction. Although a reasonab...
Shallow Text Clustering Does Not Mean Weak Topics: How Topic Identification Can Leverage Bigram Features
Text clustering and topic learning are two closely related tasks. In this paper, we show that topics can be learnt without strictly requiring an exact categorization. In particular, experiments performed on two real case studies with a vocabulary based on bigram features lead to readable topics that cover most of the documents. Precision at 10 is up to 74% for a dataset of ...
Arabic Text Summarization Model Using Clustering Techniques
The current work investigates a newly developed model for automatic Arabic text summarization. In this model, word root clustering is the main technique. Unlike previously presented extract-based Arabic text summarization systems, the current model uses the cluster weight of word roots instead of the weight of the word itself. The model is thoroughly illustrated t...
High capacity steganography tool for Arabic text using 'Kashida'
Steganography is the ability to hide secret information in cover media such as sound, pictures and text. A new approach is proposed to hide a secret in Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach attempts to maximize the use of "Kashida" so that more information can be hidden in Arabic text cover media. To approach this, some algorithms have been des...
Journal
Journal title: TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
Year: 2019
ISSN: 1303-6203
DOI: 10.3906/elk-1808-103